1. INTRODUCTION

Agriculture and farming depend chiefly on weather parameters. Weather prediction is necessary to
anticipate future climate changes, and it plays a significant role in many sectors, namely renewable
energy, transportation, manufacturing, supply chain management, agriculture and forestry. Accurate weather
forecasting helps farmers plan farming operations suitably, obtain the maximum yield and prevent crop
wastage. For a country like India, it is difficult to predict and forecast the weather parameters accurately
for various seasons and climates due to the complexity of weather events. In earlier days, weather
forecasting was poorly understood because of the limited network of weather stations. Crop losses may
sometimes occur due to false weather predictions. India has witnessed an increase in the mean temperature
since the middle of the 20th century, and man-made climate change is likely to continue apace during the
21st century. To improve the accuracy of future climate predictions, it is important to develop various
approaches for improving the knowledge of the earth-atmosphere system. Researchers have developed different
models to forecast the weather parameters, commonly using random numbers, and the generated values are
relatively similar to the climate data. Various climate models such as the mathematical model [1], empirical
model [2], ARMA [3], artificial intelligence-based models [4]-[7], fuzzy and ANFIS [8], [9], and data mining
and internet of things (IoT) based forecasting models [10], [11] are reported in the literature. Recently,
machine learning (ML) [12]-[18], deep learning [19] and hybrid models [20], [21] have been widely applied
for weather forecasting. In this research work, the weather prediction model is developed using machine
learning algorithms like support-vector machine (SVM), linear regression and decision tree. The proposed
method uses meteorological data collected from a few selected regions within India to predict the weather
parameters.


2. OVERVIEW OF MACHINE LEARNING BASED WEATHER PREDICTION MODELS 

Machine learning algorithms are categorized into two classes, namely supervised learning and
unsupervised learning. Clustering algorithms come under the unsupervised machine learning category. Ahmed
and Mohamed [22] used a linear regression machine learning model to estimate the rate of precipitation (PRCP).
Srivastava et al. [23] predicted monthly precipitation using various ML algorithms such as support vector
machine (SVM), linear regression, artificial neural network (ANN) with back propagation and long short-term
memory (LSTM) network for early warning of landslide occurrence. Devi et al. [24] used a back propagation
neural network to provide early warning of landslides, with mean square error and correlation coefficient
as the performance metrics. Sakthivel et al. [25] used intense neural network mining with a combination of
mean square error (MSE) and root mean square error (RMSE) to achieve preprocessed rainfall data with reverse
mapping values. Basha et al. [26] utilized an autoregressive integrated moving average (ARIMA) model, an
artificial neural network and a support vector machine, with MSE and RMSE as the metrics, to predict
rainfall for agriculture-related applications. Babu and Arulmozhivarman [27] applied ANN models for
effective wind speed forecasting. Table 1, given in the appendix, presents the literature survey of weather
prediction models along with the methodology and the performance metrics.


3. RESEARCH METHODS 

In this section, the methodology and the results of the weather prediction models are discussed.
Various weather parameters, namely minimum temperature (°C), maximum temperature (°C), mean temperature
(°C), relative humidity (%), wind speed (m/s) and solar radiation data (W/m²), are used to build the machine
learning models. Decision tree and linear regression-based ML models are used to predict the weather
parameters, namely precipitation, wind speed, relative humidity and solar radiation, which play a vital role
in agriculture. In addition to the ML models, temperature-based empirical models, namely the Hargreaves and
Samani model and the Bristow and Campbell model, are also utilized to predict solar radiation for the
selected Indian locations Trivandrum, Chennai and New Delhi; precipitation is predicted for Tiruchirappalli.
An empirical model is mathematics-intensive and is based on coefficient estimation. Wind speed and
percentage relative humidity were predicted for Bangalore and New Delhi.


3.1. Collection of data 

The data set for building the machine learning models was taken from the India Meteorological
Department (IMD), Pune and from the AQUASTAT website. The dataset was prepared by combining the monthly
data of the Indian locations New Delhi, Bangalore, Hyderabad, Trivandrum and Mumbai with that of several
locations in and around the state of Tamil Nadu, namely Chennai, Coimbatore, Kodaikanal, Madurai, Ooty,
Pondicherry, Ramanathapuram, Salem and Tiruchirappalli. The input parameters for predicting the
precipitation (mm/d) are minimum temperature (°C), maximum temperature (°C), mean temperature (°C), relative
humidity and wind speed (m/s). For wind speed prediction, relative humidity is excluded. The research
indicated that all the above-mentioned input parameters have significant importance in the prediction.

All the input parameters were put together in a comma separated values (CSV) file for forecasting of
wind speed. The complete dataset is divided into three sets, namely the training set, the validation set and
the testing set. It is further processed with the help of two empirical models and the ML models. Spyder, an
open-source cross-platform integrated development environment (IDE) for scientific programming in the
Python language, is used for developing the computer codes for the empirical and ML models. Further, the
machine learning models were validated using the experimental IMD testing dataset. Table 2 presents the
input parameters considered for the prediction of weather parameters.
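
As an illustration of the CSV step described above, the following is a minimal sketch of how such a file could be read in Python. The column names and the `load_records` helper are hypothetical, since the actual file layout is not given here; the three data rows reuse the first three months listed in Table 2.

```python
import csv
import io

# Hypothetical CSV layout: the column names are illustrative assumptions.
# The three data rows reuse the first three months listed in Table 2.
CSV_TEXT = """month,tmax_c,tmin_c,rel_humidity_pct,surface_pressure_kpa
1,20.64,7.28,41.28,99.13
2,25.31,11.00,38.25,98.91
3,31.18,15.90,27.27,98.58
"""


def load_records(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Convert every field to float, then restore the month as an integer.
        record = {name: float(value) for name, value in row.items()}
        record["month"] = int(record["month"])
        records.append(record)
    return records


records = load_records(CSV_TEXT)
print(len(records), records[0]["tmax_c"])
```

Reading from a file on disk would only require replacing `io.StringIO(text)` with an opened file object.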


3.2. Data splitting 

The complete dataset is divided into three sets, namely the training, testing and validation data sets:
i) Training set: 70 percent of the total dataset; ii) Test set: the remaining 30 percent of the total
dataset; iii) Validation set: real-time data or recorded data from IMD, Pune, on which the model’s final
performance is evaluated.
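
The 70/30 partition can be sketched as follows. This is only an illustrative sketch: the function name `split_train_test`, the shuffling step and the fixed seed are assumptions, as the text does not state how the rows were assigned to each set.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for monthly records (not real data).
rows = list(range(20))
train, test = split_train_test(rows)
print(len(train), len(test))
```

The separate validation set described in item iii) would be loaded independently, so it is not part of this split.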

3.3. Model selection

Linear regression and decision tree algorithms were used for the prediction of precipitation, relative
humidity, wind speed and solar radiation. Also, the temperature-based empirical models, namely the
Hargreaves and Samani model and the Bristow and Campbell model, are used to estimate the solar radiation.
The decision tree algorithm was used in the prediction of wind speed.


Table 2. Input parameters considered for the prediction of wind speed

Month    Maximum temperature   Minimum temperature   Relative humidity   Surface pressure
number   (°C)                  (°C)                  (%)                 (kPa)
1        20.64                 7.28                  41.28               99.13
2        25.31                 11.00                 38.25               98.91
3        31.18                 15.90                 27.27               98.58
4        37.62                 21.99                 18.96               98.15
5        40.50                 26.30                 20.99               97.69
6        40.12                 28.55                 31.41               97.34
7        36.46                 28.00                 56.27               97.33
8        34.38                 26.72                 70.01               97.58
9        33.80                 24.60                 67.58               97.98
10       33.09                 19.00                 46.01               98.60
11       28.60                 13.07                 38.49               98.92
12       20.64                 7.28                  41.28               99.13


3.3.1. Linear regression 


Linear regression represents a linear relationship between a dependent variable and an independent
variable. Because the relationship is linear, the model describes how the value of the dependent variable
changes in response to changes in the value of the independent variable. The equation for linear regression
is given in (1):

$$ y = a_0 + a_1 x + \varepsilon \tag{1} $$

where $y$ is the dependent variable, $x$ is the independent variable, $a_0$ is the intercept, $a_1$ is the
slope and $\varepsilon$ is the random error.

- Positive linear relation: The linear relation is said to be positive when the dependent variable (y axis)
  increases as the independent variable (x axis) increases.
- Negative linear relation: The linear relation is said to be negative when the dependent variable (y axis)
  decreases as the independent variable (x axis) increases. Figure 1 gives the pictorial representation of
  the positive and negative linear relations.


[Figure 1: two panels showing the positive (+ve) and negative lines of regression, labelled
$Y = a_0 + a_1 X$ and $Y = -a_0 + a_1 X$ respectively]

Figure 1. Positive and negative linear relation
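
To make the straight-line model concrete, the sketch below estimates the intercept and slope by ordinary least squares. It is a minimal illustration under the assumption of a single input variable; the function name `fit_line` and the sample points are hypothetical and are not taken from the study's data.

```python
def fit_line(x, y):
    """Return (a0, a1) minimising the squared error of y = a0 + a1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx               # slope
    a0 = mean_y - a1 * mean_x    # intercept
    return a0, a1


# Synthetic points lying exactly on y = 2 + 3x (illustrative, not weather data).
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]
a0, a1 = fit_line(x, y)
print(a0, a1)
```

A positive fitted slope corresponds to the positive linear relation described above, and a negative slope to the negative one.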


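Hargreaves and Samani model [28]:

$$ \frac{H}{H_0} = a\,\Delta T^{0.5} $$

Bristow and Campbell model [29]:

$$ \frac{H}{H_0} = a\left(1 - \exp\left(-b\,\Delta T^{c}\right)\right) $$

where $H$ is the global solar radiation, $H_0$ is the extraterrestrial solar radiation, $\Delta T$ is the
difference between the maximum and minimum temperature, and $a$, $b$ and $c$ are the empirical constants
determined by using a statistical regression technique.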
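
The two temperature-based models can be sketched in code as below. This is a hedged illustration that assumes the commonly cited forms H/H0 = a·ΔT^0.5 (Hargreaves and Samani) and H/H0 = a·(1 − exp(−b·ΔT^c)) (Bristow and Campbell). The function names and all coefficient and input values are placeholders chosen for illustration; they are not the constants fitted in this work.

```python
import math


def _delta_t(t_max, t_min):
    """Return the daily temperature range, rejecting inverted inputs."""
    if t_max < t_min:
        raise ValueError("t_max must not be smaller than t_min")
    return t_max - t_min


def hargreaves_samani(h0, t_max, t_min, a):
    """Estimate H from the assumed form H/H0 = a * dT**0.5."""
    return h0 * a * math.sqrt(_delta_t(t_max, t_min))


def bristow_campbell(h0, t_max, t_min, a, b, c):
    """Estimate H from the assumed form H/H0 = a * (1 - exp(-b * dT**c))."""
    delta_t = _delta_t(t_max, t_min)
    return h0 * a * (1.0 - math.exp(-b * delta_t ** c))


# Placeholder inputs: h0 in MJ/m2, temperatures in deg C, coefficients not fitted.
h_hs = hargreaves_samani(h0=30.0, t_max=34.0, t_min=18.0, a=0.16)
h_bc = bristow_campbell(h0=30.0, t_max=34.0, t_min=18.0, a=0.7, b=0.005, c=2.4)
print(h_hs, h_bc)
```

In practice the constants a, b and c would be obtained by regression against measured radiation for each location, as stated above.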

3.3.2. Decision tree

The decision tree is one of the most widely used and practical approaches to supervised machine learning.
A decision tree model can be used for regression as well as classification problems [30]. It works by
breaking the data up, in a tree-like structure, into smaller and smaller subsets. It is a particular type of
probability tree that supports making a decision about the process under study.
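
The idea of repeatedly breaking the data into smaller subsets can be sketched with a small regression tree. The code below is an illustrative, simplified version that splits on the threshold giving the lowest sum of squared errors; it is not the implementation used in this work, and the function names, depth limit and sample data are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold) with the lowest total SSE, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, threshold), cost
    return best


def build_tree(rows, targets, max_depth=3, min_samples=2):
    """Recursively split the data; each leaf stores the mean target."""
    split = None
    if max_depth > 0 and len(rows) >= min_samples:
        split = best_split(rows, targets)
    if split is None:
        return {"leaf": mean(targets)}
    j, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [t for _, t in left],
                           max_depth - 1, min_samples),
        "right": build_tree([r for r, _ in right], [t for _, t in right],
                            max_depth - 1, min_samples),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf value."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Synthetic one-feature data with two clear groups (illustrative only).
rows = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
targets = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(rows, targets)
print(predict(tree, [2.5]), predict(tree, [11.5]))
```

Library implementations add further stopping rules and pruning; the sketch keeps only a depth limit and a minimum sample count for brevity.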


4. RESULTS AND DISCUSSION 

The weather parameters can be estimated with better accuracy using ML based models in comparison
with empirical models. Figure 2 compares the estimated solar radiation with the measured values from IMD,
Pune for New Delhi, using the temperature-based empirical models, namely the Hargreaves and Samani model and
the Bristow and Campbell model. Figure 3 compares the estimated solar radiation with the measured values
from IMD, Pune for Trivandrum and Chennai, using the ML model.


[Figure 2: two panels, Hargreaves and Samani model and Bristow and Campbell model, each showing solar
radiation for New Delhi; x-axis: months of the year; y-axis: GSR (MJ/m2)]

Figure 2. Solar radiation estimation using Hargreaves and Samani model and Bristow and Campbell model


[Figure 3: two panels showing solar radiation for Trivandrum and for Chennai; x-axis: months of the year]

Figure 3. Solar radiation estimation using decision tree ML model


Machine learning based smart weather prediction (Rajasekaran Meenal) 


512 o ISSN: 2502-4752 


Figure 4 shows the predicted weather parameters, namely percentage relative humidity and wind speed,
using ML models for the locations Bangalore and New Delhi. The performance metrics used to validate the
proposed ML models are the mean squared error (MSE) and the correlation coefficient. MSE is the mean of the
squared errors between the estimated values and the recorded values; a low value is desired for an accurate
model. For better prediction, the correlation coefficient should be as close to unity as possible. From the
performance metrics in Table 3, it is concluded that the weather parameters can be predicted with better
accuracy using ML based models in comparison with empirical models.

Figure 4. Predicted weather parameters, namely % relative humidity and wind speed, using ML models


Table 3. Performance metrics

Metric                        Hargreaves and   Bristow and      Decision tree
                              Samani model     Campbell model   regression ML model
MSE                           1.5272           1.5423           0.1397
Correlation coefficient (R)   0.8921           0.8874           0.9259
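
The two metrics can be computed as in the following sketch. It assumes that the correlation coefficient R is the Pearson coefficient, which is not stated explicitly here; the function names and the sample values are illustrative and are not the measured data behind Table 3.

```python
import math


def mean_squared_error(measured, estimated):
    """Mean of the squared differences between measured and estimated values."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured)


def correlation_coefficient(measured, estimated):
    """Pearson correlation coefficient (assumed definition of R)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)


# Illustrative values: the estimates are offset from the measurements by 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 3.5, 4.5]
mse = mean_squared_error(measured, estimated)
r = correlation_coefficient(measured, estimated)
print(mse, r)
```

In this example the estimates follow the measurements perfectly in shape but carry a constant offset, so R is at its maximum while the MSE is still non-zero. This is one reason the two metrics are usefully reported together.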


5. CONCLUSION 

In the presented research work, the weather parameters rainfall, wind speed, solar radiation and
relative humidity are predicted using machine learning algorithms such as linear regression and the decision
tree model. Also, the solar radiation for a few selected Indian locations was estimated using the
conventional temperature-based empirical models, namely the Hargreaves and Samani model and the Bristow and
Campbell model. The empirical and machine learning models are validated using the recorded experimental
values obtained from IMD, Pune. The results show that the ML based models performed better than the
empirical models, with a correlation coefficient R of 0.9259 and an MSE of 0.1397. Thus the research work
has arrived at improved weather prediction with less computational effort.

Table 1. Literature survey of weather prediction models

S. No | Title/Research Area | Input Parameters | Methods and Performance Metrics
1 | Prediction of Wind Speed using Artificial Neural Network model [5] | Wind speed and direction, air temperature, air humidity, air pressure | ANN model; Mean Square Error-2.003, R²-0.951
2 | Prediction of Rainfall Using ANN models [6] | Temperature, cloud cover, vapor pressure and precipitation | Artificial neural networks (ANN): feed forward neural network (FFNN)
3 | Artificial Intelligence based Weather monitoring, Anandharajan et al. [7] | Pressure, temperature, dew point, wind speed, precipitation | MLTR model (multi-target regression model), LSTM; Epochs-40, Accuracy-87.01, RMSE-0.35, Losses-0.1274, Learning rate-0.5
4 | Wind Speed Prediction, Salisu Muhammad Lawan (ELTA) 2018 [9] | Input: nine meteorological parameters; Output: wind speed | Radial Basis Neural Network (T-RBNN); RMSE-7.18 %, Covariance-0.0098
5 | Data Mining Technique based weather forecasting model [10] | Temperature, humidity, wind direction, wind speed, atmospheric condition | Hybrid model, linear regression model and data mining based predictive model; RMSE-3.0-4.45
6 | Random forest Machine Learning model for weather prediction [12] | Minimum and maximum temperature, relative humidity | Random forest and mathematical models
7 | Rainfall Prediction using Machine Learning, Grace and Suganya [13] | High temperature, low temperature, humidity | MLR (multiple linear regression)
8 | Machine Learning based Heuristic Prediction of Rainfall [14] | Rate of rainfall in previous years; mean and standard deviation | Linear regression
9 | SVM based Atmospheric temperature prediction, Radhika et al. [15] | Temperature, humidity, wind direction, atmospheric pressure, atmospheric condition | Linear regression method; random forest regression (RFR); incorporated regression techniques SVR, MLPR, ETR
10 | Weather forecasting using hybrid neural model, Saba et al. [20] | Maximum and minimum temperature of the day, humidity | Recurrent neural network model. RNN with ReLU: Epochs-40, Accuracy-86.44, RMSE-0.76, Losses-0.5824, Learning rate-0.9. RNN with SiLU: Epochs-40, Accuracy-86.91, RMSE-0.76, Losses-0.5769, Learning rate-0.75
11 | A deep hybrid model for weather forecasting, Aditya et al. [21] | Dry temperature, wet temperature, wind speed, humidity, pressure and sunshine | ANN model: Gaussian constant and hyperbolic tangent, range [0.1, 0.9]
12 | Rainfall Prediction using Multiple Linear Regressions Model [22] | Temperature, wind speed and dew point | Multiple linear regression
13 | Machine Learning & Deep Learning Techniques for rainfall prediction [26] | Minimum temperature, maximum temperature | ARIMA model, artificial neural network, support vector machine; MSE and RMSE
14 | ANN based wind speed Forecasting, Babu [27] | Temperature, humidity, pressure | ARMA model, ANN-BPN model, ANN-GRNN model, ANN-RBFN model. BPN: Mean Square Error-0.195, Mean Absolute Error-0.302. GRNN: Mean Square Error-0.023, Mean Absolute Error-0.041. RBFN: Mean Square Error-0.009, Mean Absolute Error-0.022